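The instruction set architecture (ISA) serves as the fundamental contract between software and hardware. It defines the programmer-visible state and the specific operations the processor executes. The Y86-64 ISA is an educational ISA modeled on a subset of x86-64: it simplifies a complex CISC design into a more tractable model while keeping register-based procedure linkage.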
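1. Programmer-Visible State
The state comprises a register file (RF) with 15 64-bit registers (%rax through %r14), three condition codes (CC: ZF, SF, OF) used for control flow, the program counter (PC), memory, and a status code (Stat) that indicates normal operation (AOK), halt (HLT), or an error (ADR for an invalid address, INS for an invalid instruction).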
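As a rough illustration of the state listed above, the following Python sketch models it as a plain data structure. The class and field names (`State`, `regs`, `zf`, and so on) are hypothetical, and the initial values are assumptions chosen for the example rather than anything the ISA mandates.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stat(Enum):
    """Status codes of the Y86-64 programmer-visible state."""
    AOK = 1  # normal operation
    HLT = 2  # halt instruction executed
    ADR = 3  # invalid address encountered
    INS = 4  # invalid instruction encountered


# The 15 program registers, in encoding order (IDs 0x0 through 0xE).
REG_NAMES = ["%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
             "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14"]


@dataclass
class State:
    # Register file: one 64-bit value per register (initial zeros are an assumption).
    regs: dict = field(default_factory=lambda: {r: 0 for r in REG_NAMES})
    # Condition codes: zero, sign, and overflow flags.
    zf: bool = False
    sf: bool = False
    of: bool = False
    # Program counter and status code.
    pc: int = 0
    stat: Stat = Stat.AOK
    # Memory modeled as a sparse map from byte address to byte value.
    mem: dict = field(default_factory=dict)


state = State()
print(len(state.regs), state.stat.name)
```

Keeping the whole state in one object makes it easy to see that an instruction can only change these fields and nothing else.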
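2. CISC and RISC Characteristics
x86-64 is a classic CISC design. Y86-64 keeps some CISC traits, such as condition codes and instructions of different lengths (1 to 10 bytes), but it uses a regular encoding in which each instruction type has a fixed length, and it follows a strict load/store architecture: memory is accessed only through dedicated move instructions, for example rmmovq rA, D(rB).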
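To make the encoding concrete, here is a small Python sketch that assembles three move instructions into bytes. The helper names (`le64`, `irmovq`, `rrmovq`, `rmmovq`) are hypothetical and only cover these three instructions; this is an illustration of the byte layout, not a full assembler.

```python
# Register IDs follow the Y86-64 encoding order; 0xF means "no register".
REG_NAMES = ["%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
             "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14"]
REG_ID = {name: i for i, name in enumerate(REG_NAMES)}
NO_REG = 0xF


def le64(value):
    """Encode a 64-bit constant as 8 little-endian bytes (two's complement)."""
    return (value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")


def irmovq(const, rb):
    """irmovq V, rB: byte 0x30, then F:rB, then the 8-byte constant (10 bytes)."""
    return bytes([0x30, (NO_REG << 4) | REG_ID[rb]]) + le64(const)


def rrmovq(ra, rb):
    """rrmovq rA, rB: byte 0x20, then rA:rB (2 bytes)."""
    return bytes([0x20, (REG_ID[ra] << 4) | REG_ID[rb]])


def rmmovq(ra, disp, rb):
    """rmmovq rA, D(rB): byte 0x40, then rA:rB, then the 8-byte displacement (10 bytes)."""
    return bytes([0x40, (REG_ID[ra] << 4) | REG_ID[rb]]) + le64(disp)


print(irmovq(15, "%rbx").hex(" "))  # 30 f3 0f 00 00 00 00 00 00 00
print(rrmovq("%rbx", "%rcx").hex(" "))  # 20 31
print(rmmovq("%rsp", 0x123456789ABCD, "%rdx").hex(" "))  # 40 42 cd ab 89 67 45 23 01 00
```

The lengths differ between instruction types (2 bytes versus 10 bytes), but every instance of a given type has the same length, which keeps decoding simple.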
QUESTION 1
Modify the sum function (Figure 4.6) to implement absSum using a conditional jump. Which approach is most architecturally sound for Y86-64?
Using jge to skip a subq instruction that negates the value.
Using a call to a separate absolute value function.
Using a memory-to-memory comparison.
Changing the status code to INS if the value is negative.
✅ Correct!
In Y86-64, we test the value (andq %r10, %r10) and then use jge to jump past the subtraction logic if the value is non-negative.
❌ Incorrect
Y86-64 does not support memory-to-memory comparisons or hardware-level status changes for logic control.
QUESTION 2
When implementing absSum with conditional move (cmovXX), how do we handle the sign inversion?
Subtract the value from zero in a temporary register, then cmovl the negative result back.
Use the iaddq instruction to flip the bits.
Y86-64 performs sign inversion automatically during mrmovq.
Conditional moves cannot be used for arithmetic logic.
✅ Correct!
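We compute -x in a temporary register (e.g., with subq from a zeroed register) and then conditionally move it over x when the original value was less than zero. Note that subq sets the condition codes from -x, not from x: either re-test x (andq %r10, %r10) before the cmovl, or apply cmovg directly after the subtraction.
❌ Incorrect
cmovXX only moves data; it does not perform arithmetic during the move itself.
QUESTION 3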
What is the byte encoding for the sequence: irmovq $15, %rbx; rrmovq %rbx, %rcx? (Starting at 0x100)
0x100: 30 f3 0f 00 00 00 00 00 00 00; 0x10a: 20 31
0x100: 30 3f 0f 00; 0x104: 20 13
0x100: 60 31; 0x102: 30 f3
0x100: 70 00; 0x102: 20 31
✅ Correct!
irmovq is 10 bytes (30, then F rB, then the 8-byte little-endian constant ValC) and rrmovq is 2 bytes (20, then rA rB).
❌ Incorrect
Recall that each Y86-64 instruction type has a fixed length; irmovq always takes 10 bytes, including the 8-byte constant.
QUESTION 4
Determine the HCL code for the control signal mem_write in a SEQ processor.
bool mem_write = icode in { IRMMOVQ, IPUSHQ, ICALL };
bool mem_write = icode in { IMRMOVQ, IPOPQ, IRET };
bool mem_write = (valE == valM);
bool mem_write = stat == AOK;
✅ Correct!
Only instructions that push to the stack or store to memory (rmmovq, pushq, call) trigger a memory write.
❌ Incorrect
IMRMOVQ and IPOPQ perform memory reads, not writes.
QUESTION 5
In the PIPE implementation, when should the signal E_bubble be set?
On mispredicted branches or load-use hazards.
Every time the PC is updated.
Only when the processor hits a HALT instruction.
When the Register File is being read.
✅ Correct!
E_bubble injects a bubble into the execute stage, either to cancel an instruction fetched after a mispredicted branch or to fill the gap while a load-use hazard stalls the fetch and decode stages.
❌ Incorrect
Bubbling is a specific hazard management technique, not a standard part of every cycle.
Case Study: Architectural Optimization and Logic
Advanced Y86-64 Implementation Details
You are tasked with extending the Y86-64 design. Consider the introduction of the iaddq instruction and the performance limits of a pipelined system with $k$ stages and overhead $T_{overhead}$.
Q
1. [Writing Task] Rewrite the Y86-64 sum function of Figure 4.6 to make use of the iaddq instruction. (Output: ~14 lines).
Solution:
Model Solution:
sum:
    irmovq $0, %rax        # sum = 0
    andq %rsi, %rsi        # set CC from count
    jmp test               # start at the test
loop:
    mrmovq (%rdi), %rdx    # get *start
    addq %rdx, %rax        # sum += *start
    iaddq $8, %rdi         # start++ (iaddq optimization)
    iaddq $-1, %rsi        # count-- (iaddq optimization), sets CC
test:
    jg loop                # if count > 0, loop
    ret                    # return
Note: This removes the need for registers %r8 and %r9, which were previously used to store the constants 8 and 1.
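The control flow of the model solution can be traced with a short Python mirror of the routine. This is a simplified sketch: the function name `sum_iaddq`, the dictionary-based memory, and the reduction of the condition codes to "signed value of the last result" are all assumptions made for illustration, and the overflow flag is ignored.

```python
MASK = (1 << 64) - 1


def to_signed(value):
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    value &= MASK
    return value - (1 << 64) if value >> 63 else value


def sum_iaddq(memory, start, count):
    """Mirror of the sum routine; memory maps word addresses to 64-bit values."""
    rax = 0                         # irmovq $0, %rax
    rdi, rsi = start, count & MASK
    cc = to_signed(rsi & rsi)       # andq %rsi, %rsi sets CC from count
    while cc > 0:                   # test: jg loop
        rdx = memory[rdi]           # mrmovq (%rdi), %rdx
        rax = (rax + rdx) & MASK    # addq %rdx, %rax
        rdi = (rdi + 8) & MASK      # iaddq $8, %rdi
        rsi = (rsi - 1) & MASK      # iaddq $-1, %rsi
        cc = to_signed(rsi)         # CC now reflect the decremented count
    return to_signed(rax)           # ret, with the result in %rax


words = {0x200: 1, 0x208: 2, 0x210: 3}
print(sum_iaddq(words, 0x200, 3))  # 6
```

The point to notice is that the final iaddq both decrements the count and sets the condition codes that jg tests, so no separate comparison is needed inside the loop.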
Q
2. Write HCL code for a circuit that selects the median of word inputs A, B, and C.
Solution:
word median = [
    (A <= B && B <= C) || (C <= B && B <= A) : B;
    (B <= A && A <= C) || (C <= A && A <= B) : A;
    1 : C;
];
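One way to gain confidence in the case expression is to mirror it in Python and compare it with a sort-based median over a small input range. The function names `median` and `check` are hypothetical, and the exhaustive check over a few small values is only a sanity test, not a proof.

```python
from itertools import product


def median(a, b, c):
    """Mirror of the HCL case expression: the first matching case wins."""
    if (a <= b <= c) or (c <= b <= a):
        return b            # B lies between A and C
    if (b <= a <= c) or (c <= a <= b):
        return a            # A lies between B and C
    return c                # default case: C must be the median


def check(limit=5):
    """Compare against the middle element of the sorted triple."""
    return all(median(a, b, c) == sorted((a, b, c))[1]
               for a, b, c in product(range(limit), repeat=3))


print(median(3, 1, 2), check())  # 2 True
```

As in HCL, the order of the cases matters only for the final default: once neither A nor B is in the middle, C is the only remaining candidate.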
Q
3. As the number of pipeline stages $k$ goes to infinity, what happens to the throughput?
Solution:
Throughput $= 1 / (T/k + T_{overhead})$, where $T$ is the total logic delay that is divided evenly among the $k$ stages. As $k$ approaches infinity, the term $T/k$ vanishes, and the throughput approaches a limit of $1 / T_{overhead}$. This shows that pipeline overhead eventually becomes the bottleneck for processor speed.
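The limit can also be seen numerically. The sketch below evaluates the formula for increasing stage counts; the function name `throughput_gips` and the sample values (300 ps of total logic delay, 20 ps of overhead per stage) are illustrative assumptions, not figures given in the exercise.

```python
def throughput_gips(total_delay_ps, overhead_ps, k):
    """Throughput in giga-instructions per second for a k-stage pipeline.

    Each stage takes total_delay_ps / k of logic plus overhead_ps, and one
    instruction completes per cycle; 1000 / (cycle time in ps) gives GIPS.
    """
    cycle_ps = total_delay_ps / k + overhead_ps
    return 1000.0 / cycle_ps


# Assumed example values: 300 ps of logic, 20 ps of overhead per stage.
for k in (1, 3, 6, 30, 300, 10**6):
    print(k, round(throughput_gips(300, 20, k), 2))

# Upper bound as k grows without limit: 1000 / overhead_ps.
print(1000.0 / 20)  # 50.0
```

With these numbers, going from 1 to 3 stages more than doubles the throughput, but each further increase in k yields less, and no stage count can exceed 50 GIPS.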